Abstract

We consider the Shannon cipher system in a setting where the secret key is delivered 
to the legitimate receiver via a channel with limited capacity. For this setting, we 
characterize the achievable region in the space of three figures of merit: the security 
(measured in terms of the equivocation), the compressibility of the cryptogram, and the 
distortion associated with the reconstruction of the plaintext source. Although lossy 
reconstruction of the plaintext does not rule out the option that the (noisy) decryption 
key would differ, to a certain extent, from the encryption key, we show, nevertheless, 
that the best strategy is to strive for perfect match between the two keys, by applying 
reliable channel coding to the key bits, and to control the distortion solely via rate- 
distortion coding of the plaintext source before the encryption. In this sense, our result 
has a flavor similar to that of the classical source-channel separation theorem. Some 
variations and extensions of this model are discussed as well.



Index Terms: Shannon cipher system, key distribution, encryption, cryptography, 
source-channel separation. 

1 Introduction 

In the classical Shannon-theoretic approach to cryptology (see, e.g., |H],|I], JO] and refer- 
ences therein), two assumptions are traditionally made. The first is that the reconstruction 
of the decrypted plaintext source at the legitimate receiver is distortion-free (or almost 
distortion-free), and the second, which is related, is that the encryption and the decryption 
units share identical copies of the same key. Yamamoto [11] has relaxed the first
assumption and extended the theory of Shannon secrecy systems into a rate-distortion
scenario, allowing lossy reconstruction at the legitimate receiver.
In this correspondence, we examine also the second assumption. Referring to Fig. 1,
we consider the case where the key is delivered to the legitimate receiver across a channel, 
which is cryptographically secure, but has limited capacity. For this setting, we characterize 
the achievable region in the space of three figures of merit: the security level (measured 
in terms of the equivocation), the compressibility of the cryptogram, and the distortion 
associated with the reconstruction of the plaintext source. 

One conceptually simple approach to handle such a situation would be to apply a reliable 
channel code to the encryption key bits, at a rate below the capacity of the channel, and 
thereby obtain, with high probability, the exact copy of the transmitted key bits at the 
receiver side. With this approach, however, the effective key rate, and hence the security 
level in terms of the equivocation, is limited by the channel capacity. The question that 
naturally arises at this point, especially in the lossy reconstruction scenario, is whether this 
is the best one can do. 

To sharpen the question, let us even assume that there is an unlimited reservoir of 
random key bits at the transmitter side, denoted $K = (K_1, K_2, \ldots)$, $K_i \in \{0,1\}$,
$i = 1, 2, \ldots$. Then, perhaps one might wish to use more key rate (somewhat above capacity)
for encryption and thereby increase the security of the cryptogram at the expense of some 
distortion at the reconstruction, due to the unavoidable mismatch between the encryption 
and decryption keys. To explore this point, let us consider a few speculative strategies. 

In the first strategy, one sends the key bits $K$ across the channel uncoded (assuming,
for simplicity, that the channel has binary input and output alphabets). Referring to Fig.
1, let us then take $N = n$ and $X_i = K_i$, $i = 1, 2, \ldots$. In this case, the noisy version
of the key obtained at the receiver side, $K'_i = Y_i$, is of course somewhat different from
the original key. However, since only lossy reconstruction of the plaintext is required at
the receiver side, it may seem conceivable that a reasonably small difference between the
keys at both ends could be manageable and thus cause a reasonably small distortion in
the reconstruction. This is relatively easy to arrange if the encryption of the source precedes
compression, as proposed in [2]: One may apply, for example, a certain memoryless mapping
from the key bit stream into a stream of symbols $Z_1, Z_2, \ldots$ taking (two of the) values in the
alphabet of the plaintext source, $\mathcal{U}$. Then, assuming that $\mathcal{U}$ is a commutative group endowed
with an addition operation $\oplus$ (e.g., addition modulo the alphabet size), one can create the
encrypted sequence $U'_i = U_i \oplus Z_i$, $i = 1, 2, \ldots$, and then compress the block $(U'_1, \ldots, U'_n)$
with $(K'_1, \ldots, K'_n)$ as side information at the receiver, using a Slepian-Wolf encoder [7]
in the lossless case, or a Wyner-Ziv code [Oj in the lossy case. Assuming, for simplicity,
lossless compression, then upon decompressing at the receiver side and obtaining
$(\hat{U}'_1, \ldots, \hat{U}'_n)$ (which is with high probability equal to $(U'_1, \ldots, U'_n)$), one 'subtracts' the
noisy version of the key and obtains (with high probability) the reconstruction $V_i = U'_i \ominus Z'_i$,
$i = 1, 2, \ldots$, where $Z'_i$ is the corresponding noisy version of $Z_i$. Now, since
$V_i \ominus U_i = Z_i \ominus Z'_i$ for all $i$, then for a difference distortion measure
$d(U_i, V_i) = \rho(V_i \ominus U_i)$, the distortion between $U_i$ and its reconstruction $V_i$ is
identical to the distortion between the original key symbol $Z_i$ and its noisy version $Z'_i$.
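
As a rough illustration of the identity $V_i \ominus U_i = Z_i \ominus Z'_i$, the following Python
sketch encrypts a sequence by addition modulo $q$ and decrypts it with a corrupted key. The
alphabet size `q`, the 10% symbol-error rate of the key, and all function names are assumptions
made for this example only; the Slepian-Wolf stage is omitted, i.e., the encrypted block is
assumed to reach the receiver intact.

```python
import random

def encrypt(u, z, q):
    """Additive encryption over the group Z_q: u'_i = u_i + z_i (mod q)."""
    return [(ui + zi) % q for ui, zi in zip(u, z)]

def decrypt(u_enc, z_noisy, q):
    """Subtract the (possibly noisy) key: v_i = u'_i - z'_i (mod q)."""
    return [(ci - zi) % q for ci, zi in zip(u_enc, z_noisy)]

def difference(a, b, q):
    """Symbol-wise group difference a_i - b_i (mod q)."""
    return [(ai - bi) % q for ai, bi in zip(a, b)]

q = 4                      # hypothetical alphabet size
rng = random.Random(0)     # fixed seed so the run is reproducible
u = [rng.randrange(q) for _ in range(1000)]   # plaintext symbols
z = [rng.randrange(q) for _ in range(1000)]   # key symbols

# Hypothetical key corruption: each symbol is changed with probability 0.1.
z_noisy = [(zi + rng.randrange(1, q)) % q if rng.random() < 0.1 else zi
           for zi in z]

v = decrypt(encrypt(u, z, q), z_noisy, q)
reconstruction_error = difference(v, u, q)    # V_i - U_i
key_error = difference(z, z_noisy, q)         # Z_i - Z'_i
```

Under these assumptions the two error sequences coincide symbol by symbol, so any difference
distortion measure assigns the same value to the source pair and to the key pair.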

A somewhat more sophisticated version of this scheme generates $Z_1, Z_2, \ldots$ from the
key bits using a simulator of a certain (memoryless) process (see, e.g., jS] and references
therein), and then applies a good source-channel code to encode $(Z_1, \ldots, Z_n)$ across the
channel. The reconstructed version at the receiver side, $Z'_1, Z'_2, \ldots$, would then have the
minimum possible distortion relative to $(Z_1, \ldots, Z_n)$, given by the distortion-rate function
of $\{Z_i\}$ computed at the channel capacity, and therefore so would the distortion
between $\{U_i\}$ and $\{V_i\}$. Moreover, there is an additional degree of freedom with regard
to the choice of the probability law of $\{Z_i\}$ for trading off between the security, which is
given by the entropy rate of $\{Z_i\}$, and the distortion, i.e., the distortion-rate function of $\{Z_i\}$
computed at the channel capacity.

Another solution strategy may be based on the following point: Note that for the purpose
of reliable transmission and decoding of the key bits across the channel, the cryptogram
(denoted by $W^m$ in Fig. 1), which is a function of these key bits as well, may serve as
useful side information at the decoder, unless it is statistically independent of these bits.
Thus, one would speculate that it might be wise to allow some dependence between $W^m$
and $K$ and thus sacrifice some compression performance for the benefit of gaining performance
in communicating the key across the channel. Let us assume that the bits of the key string
$K^m = (K_1, \ldots, K_m)$ are XORed (added modulo 2) with the bits of the compressed version
of the source. Then, if the compression algorithm is designed in such a way that the bits
of the compressed version of $U^N$ are not symmetric, then $W^m$ is correlated with $K^m$, and
so $W^m$ can be viewed as a noisy version of $K^m$, which was transmitted uncoded across
a "parallel channel". In such a case, we can then think of the key bits as being encoded
using a systematic code across the combined channel whose outputs are $W^m$ and $Y^n$, and
the effective rate of this code is smaller than that over the original channel depicted in Fig.
1. Another way to look at this is the following: The key string $K^m$ can be compressed
by a Slepian-Wolf encoder given $W^m$ (as side information at the decoder) before being
channel coded, thus increasing the effective capacity by a factor given by the reciprocal of
the conditional entropy of the key given the cryptogram.

We show in this correspondence that none of the ideas raised in the last four paragraphs, 
nor any other creative idea one may have, can work better than the first strategy we 
mentioned earlier, which is the following: At the lower part of the encoder of Fig. 1 (the 
"key encoder"), use a good channel code at rate below capacity, whose role is to reliably 
transmit a certain amount of key bits. At the upper block of the encoder of Fig. 1, first 
compress $U^N$ by an optimal rate-distortion code to obtain $NR(D)$ bits, where $R(D)$ is the
rate-distortion function of the source, and then encrypt the compressed bitstream with the same
bits that are fed into the channel code. At the receiver, first decode the key bits from the 
channel output, and then use them to decrypt and decompress the source. 
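
A minimal Python sketch of the encryption step of this scheme follows. The compressed bits
stand in for the output of a rate-distortion encoder, which is not implemented here; the
function name and the convention of leaving the tail in the clear when fewer key bits than
compressed bits are available are illustrative assumptions (the latter mirrors the
construction used in the direct part of the proof in Section 3).

```python
import random

def xor_encrypt(compressed_bits, key_bits):
    """XOR the first min(len(data), len(key)) compressed bits with the key.

    Compressed bits beyond the available key are left unencrypted.
    Applying the function twice with the same key restores the input,
    so the same routine serves as the decryption step.
    """
    ell = min(len(compressed_bits), len(key_bits))
    head = [b ^ k for b, k in zip(compressed_bits[:ell], key_bits[:ell])]
    return head + list(compressed_bits[ell:])

rng = random.Random(1)
# Stand-in for the N*R(D) bits produced by a rate-distortion code.
compressed = [rng.getrandbits(1) for _ in range(16)]
# Stand-in for the key bits that the channel code delivers reliably.
key = [rng.getrandbits(1) for _ in range(10)]

cryptogram = xor_encrypt(compressed, key)
recovered = xor_encrypt(cryptogram, key)   # receiver side, after key decoding
```

The design point is that the distortion is fixed entirely by the rate-distortion code; the
key only masks bits and, when decoded correctly, introduces no further error.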

The result on the optimality of this scheme has a flavor similar to that of the classical 
source-channel separation theorem in three aspects: (i) there is a complete decoupling
between source coding (for $U^N$) and channel coding (for the key bits) from the operative
point of view as well as from the viewpoint of code design (unlike in the other strategies
described above), (ii) the best possible strategy of controlling the distortion is only via
rate-distortion coding, and (iii) the necessary and sufficient condition for perfect secrecy is
$NR(D) \le nC$, which is of the same form as the source-channel separation theorem.

The outline of this correspondence is as follows. In Section 2, we define notation con- 
ventions and give a formal definition of the problem. In Section 3, we state and prove the 
main result, and in Section 4, we discuss a few variations and extensions. 

2 Notation Conventions and Problem Definition 

We begin by establishing some notation conventions. Throughout this paper, scalar random 
variables (RV's) will be denoted by capital letters, their sample values will be denoted by 
the respective lower case letters, and their alphabets will be denoted by the respective 
calligraphic letters. A similar convention will apply to random vectors and their sample 
values, which will be denoted with the same symbols superscripted by the dimension. Thus,
for example, $U^N$ ($N$ a positive integer) will denote a random $N$-vector $(U_1, \ldots, U_N)$, and
$u^N = (u_1, \ldots, u_N)$ is a specific vector value in $\mathcal{U}^N$, the $N$-th Cartesian power of $\mathcal{U}$.

Sources and channels will be denoted generically by the letter $P$, subscripted by the
name of the RV and its conditioning, if applicable, e.g., $P_U(u)$ is the probability function
of $U$ at the point $U = u$, $P_{Y|X}(y|x)$ is the conditional probability of $Y = y$ given $X = x$,
and so on. Whenever clear from the context, these subscripts will be omitted. Information
theoretic quantities like entropies and mutual informations will be denoted following the
usual conventions of the Information Theory literature, e.g., $H(U^N)$, $I(X^n;Y^n)$, and so
on. For single-letter information quantities (i.e., when $n = 1$ or $N = 1$), subscripts will be
omitted, e.g., $H(U^1) = H(U_1)$ will be denoted by $H(U)$, similarly, $I(X^1;Y^1) = I(X_1;Y_1)$
will be denoted by $I(X;Y)$, and so on.

We now turn to the formal description of the model and the problem setting, as
described in the Introduction, and referring to Fig. 1. A source $P_U$ generates a
sequence of independent copies, $U_1, U_2, \ldots$, of a finite-alphabet RV, $U \in \mathcal{U}$, whose entropy is
$H(U) = -\sum_{u \in \mathcal{U}} P_U(u) \log_2 P_U(u)$. At the same time and independently, a discrete
memoryless channel (DMC) $P_{Y|X}$ receives input symbols $x_1, x_2, \ldots$ with coordinates taking values
in a finite alphabet $\mathcal{X}$, and produces output symbols $y_1, y_2, \ldots$ with coordinates taking
values in a finite alphabet $\mathcal{Y}$, according to a conditional probability law given by the product
of single-letter transition probabilities $\prod_t P_{Y|X}(y_t|x_t)$. The relative rate between the
operation of the channel $P_{Y|X}$ and that of the source is $\lambda$ channel symbols per source symbol.
This means that while the source generates a block of $N$ symbols, say, $U^N = (U_1, \ldots, U_N)$,
according to the above mentioned probability law, the channel conveys $n = \lambda N$
transmissions,$^1$ i.e., it receives a channel input block of length $n$, $X^n = (X_1, \ldots, X_n)$, and outputs
another block of the same length, $Y^n = (Y_1, \ldots, Y_n)$, according to the above described
conditional probability law. Let $C = \max_{P_X} I(X;Y)$ denote the channel capacity.

In addition to the source $P_U$ and the channel $P_{Y|X}$, yet another source, $P_K$, henceforth
referred to as the key source, generates an infinite sequence of i.i.d. purely random bits,
$K = (K_1, K_2, \ldots)$, independently of the source $U_1, U_2, \ldots$. The operation rate of the key
source relative to the source $P_U$ (and the channel $P_{Y|X}$) will be immaterial, i.e., we will
assume that the reservoir of key bits, for every finite period of time, is sufficiently large so
that it is effectively unlimited.

A block code for joint coding and encryption with parameters $n$ and $\lambda = n/N$ consists
of three mappings. The first mapping is the compressor-encrypter
$f_N : \mathcal{U}^N \times \{0,1\}^\infty \to \{0,1\}^m$, where $m = \mu N$, $\mu > 0$ being the compression rate.
Upon receiving a source vector $u^N \in \mathcal{U}^N$ and a key sequence $k \in \{0,1\}^\infty$, this mapping
produces a binary cryptogram $w^m \in \{0,1\}^m$ according to $w^m = f_N(u^N, k)$. The second mapping
is the key-encoder $g_N : \{0,1\}^\infty \to \mathcal{X}^n$, which produces a channel input vector $x^n$
according to $x^n = g_N(k)$. Finally, the third mapping is the decoder
$h_N : \{0,1\}^m \times \mathcal{Y}^n \to \mathcal{V}^N$, where $\mathcal{V}$ is the reproduction alphabet. Upon receiving a
cryptogram $w^m$ and a channel output vector $y^n$, the decoder produces a reproduction vector
according to $v^N = h_N(w^m, y^n)$.

$^1$ Without essential loss of generality, we will assume that $\lambda N$ is a positive integer.

Let $d : \mathcal{U} \times \mathcal{V} \to \mathbb{R}^+$ denote a single-letter distortion measure between source symbols
and the reproduction symbols, and let the distortion between the vectors, $u^N \in \mathcal{U}^N$ and
$v^N \in \mathcal{V}^N$, be defined additively across the corresponding components, as usual. We will
assume that $d$ is bounded, i.e., $d_{\max} = \max_{u,v} d(u,v) < \infty$. Let $R(D)$ denote the
rate-distortion function of the source $P_U$ with respect to $d$.

An $(n, \lambda, D, R_c, h)$ code for joint coding and encryption is a block code with parameters
$n$ and $\lambda$, as above, which also satisfies the following requirements:

1. The expected distortion between the source and the reproduction satisfies

$$\sum_{i=1}^{N} E\,d(U_i, V_i) \le ND. \tag{1}$$

2. The rate of the cryptogram satisfies

$$\mu = \frac{m}{N} \le R_c. \tag{2}$$

3. The equivocation of the source satisfies

$$H(U^N|W^m) \ge Nh. \tag{3}$$

For a given $\lambda$, a triple $(D, R_c, h)$ is said to be achievable if for every $\epsilon > 0$, there is a
sufficiently large $n$ for which $(n, \lambda, D+\epsilon, R_c+\epsilon, h-\epsilon)$ block codes for joint coding and
encryption exist. Our purpose in this paper is to characterize the achievable region of
triples $(D, R_c, h)$, i.e., the set of all achievable triples $(D, R_c, h)$.

3 Main Result 

Our main coding theorem is the following: 
Theorem 1 A triple $(D, R_c, h)$ is achievable if and only if the following conditions are both
satisfied:

(a) $h \le h^*(D) = H(U) - [R(D) - \lambda C]^+$, where $[a]^+ = \max\{a, 0\}$.

(b) $R_c \ge R(D)$.

It should be noted that for a given $D$, there is no conflict (or interaction) between
maximizing $h$ and minimizing $R_c$: As is well known, $R_c$ is lower bounded by $R(D)$ even if
there is no security requirement, but on the other hand, even in the presence of the highest
possible security level requirement, $h^*(D)$, the compression ratio $R(D)$ is still achievable.
By the same token, and as will be evident from the proof, $h$ is upper bounded by
$h^*(D)$ even if there is no compressibility requirement, yet it remains achievable even if the
compression ratio of $R(D)$ is required.
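
To make the bound in part (a) concrete, here is a small Python sketch that evaluates
$h^*(D) = H(U) - [R(D) - \lambda C]^+$. The choice of a binary symmetric source with Hamming
distortion, for which $R(D) = 1 - h_2(D)$ on $0 \le D \le 1/2$, and of a binary symmetric key
channel with crossover probability $p$, for which $C = 1 - h_2(p)$, is an assumption made
purely for illustration, as are the function names.

```python
from math import log2

def binary_entropy(p):
    """Binary entropy function h2(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)

def rate_distortion_bss(D):
    """R(D) of a binary symmetric source under Hamming distortion."""
    return 1.0 - binary_entropy(D) if D < 0.5 else 0.0

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with crossover probability p."""
    return 1.0 - binary_entropy(p)

def max_equivocation(H_U, R_D, lam, C):
    """h*(D) = H(U) - [R(D) - lam*C]^+ as in part (a) of Theorem 1."""
    return H_U - max(R_D - lam * C, 0.0)

# Example: D = 0.11 and a key channel with crossover probability 0.11.
R = rate_distortion_bss(0.11)
C = bsc_capacity(0.11)
h_no_key_channel = max_equivocation(1.0, R, 0.0, C)   # lam = 0: no key delivered
h_matched = max_equivocation(1.0, R, 1.0, C)          # lam = 1: R(D) = lam*C
```

With no key channel at all the bound reduces to $H(U) - R(D)$, and it saturates at $H(U)$
as soon as $\lambda C$ reaches $R(D)$.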

The remaining part of this section is devoted to the proof of Theorem 1. 

Proof. We begin with the converse part. Let an $(n, \lambda, D+\epsilon, R_c+\epsilon, h-\epsilon)$ block code for
joint coding and encryption be given. Now, since

$$h^*(D) = H(U) - [R(D) - \lambda C]^+ = \min\{H(U),\; H(U) - R(D) + \lambda C\}, \tag{4}$$

we have to prove that both $h \le H(U)$ and $h \le H(U) - R(D) + \lambda C$. The first bound is
trivial since

$$N(h-\epsilon) \le H(U^N|W^m) \le H(U^N) = NH(U), \tag{5}$$

where the first inequality is by definition of an $(n, \lambda, D+\epsilon, R_c+\epsilon, h-\epsilon)$ block code for
joint coding and encryption. The inequality $h \le H(U)$ now follows from the arbitrariness
of $\epsilon > 0$. As for the second bound, we have

$$\begin{aligned}
N(h-\epsilon) &\le H(U^N|W^m) \\
&= H(U^N|W^m,Y^n) + I(U^N;Y^n|W^m) \\
&= H(U^N|W^m,Y^n,V^N) + H(Y^n|W^m) - H(Y^n|W^m,U^N) \\
&\le H(U^N|V^N) + H(Y^n) - H(Y^n|W^m,U^N,X^n) \\
&= H(U^N) - I(U^N;V^N) + H(Y^n) - H(Y^n|X^n) \\
&\le NH(U) - NR(D+\epsilon) + I(X^n;Y^n) \\
&\le N[H(U) - R(D+\epsilon)] + nC,
\end{aligned} \tag{6}$$

where the second line is a standard identity, the third is because $V^N$ is a function of
$(W^m, Y^n)$, the fourth is because conditioning reduces entropy (used thrice), the fifth is
due to the fact that $(U^N, W^m) \to X^n \to Y^n$ is a Markov chain, the sixth is due to the
memorylessness of the source and the fact that $R(D) = \min\{I(U;V) : E\,d(U,V) \le D\}$
(which is also convex), and the last line is due to the memorylessness of the channel and the
fact that $C = \max_{P_X} I(X;Y)$. Again, dividing by $N$, and using the arbitrariness of $\epsilon > 0$
as well as the continuity of $R(D)$, we get the second bound on $h$, and so, the necessity of
condition (a) follows.

The proof of the necessity of condition (b) is similar to the proof of the converse to
the ordinary rate-distortion coding theorem, except that the presence of $Y^n$ (which is
independent of $U^N$) at the decoder has to be taken into account:

$$\begin{aligned}
N(R_c+\epsilon) &\ge H(W^m) \\
&\ge H(W^m|Y^n) \\
&\ge I(U^N;W^m|Y^n) \\
&= \sum_{i=1}^{N} \left[ H(U_i|U^{i-1},Y^n) - H(U_i|U^{i-1},W^m,Y^n) \right] \\
&\ge \sum_{i=1}^{N} \left[ H(U_i) - H(U_i|W^m,Y^n) \right] \\
&= \sum_{i=1}^{N} I(U_i;W^m,Y^n) \\
&\ge \sum_{i=1}^{N} I(U_i;V_i) \\
&\ge NR(D+\epsilon),
\end{aligned} \tag{7}$$

where the first line is by definition of an $(n, \lambda, D+\epsilon, R_c+\epsilon, h-\epsilon)$ block code for joint coding
and encryption, the second, third, fourth and sixth are standard identities and inequalities,
the fifth is based on the memorylessness of the source and its independence of $Y^n$, the
seventh is based on the data processing inequality and the fact that $V_i$ is a function of
$(W^m, Y^n)$, and the last inequality is again by the informational definition of $R(D)$ and its
convexity. Taking again $\epsilon$ to zero, this completes the proof of the converse part of Theorem
1.

As for the direct part, consider the following (conceptually) simple coding scheme. For a
given arbitrarily small $\epsilon > 0$, let $\ell = \min\{n(C-\epsilon),\, N[R(D)+\epsilon]\}$ and let
$x^n = g_N(k_1, \ldots, k_\ell)$ be given by a channel code whose error probability is below some
$\delta > 0$, provided that $n$ is sufficiently large. Since the rate of this code never exceeds
$C - \epsilon$, such a channel code exists by the classical channel coding theorem. As for $f_N$,
first apply a rate-distortion code for $U^N$, whose rate is $R_c = R(D) + \epsilon$, and then encrypt
$\ell$ of the resulting $m = N[R(D)+\epsilon]$ bits by $(k_1, \ldots, k_\ell)$ (using the ordinary
bit-by-bit XOR). As for the equivocation, we have

$$\begin{aligned}
H(U^N|W^m) &= H(U^N) - I(U^N;W^m) \\
&= NH(U) - H(W^m) + H(W^m|U^N) \\
&\ge NH(U) - N[R(D)+\epsilon] + H(W^m|U^N) \\
&= NH(U) - N[R(D)+\epsilon] + \ell \\
&= NH(U) - N[R(D)+\epsilon] + \min\{n(C-\epsilon),\, N[R(D)+\epsilon]\} \\
&\ge N\left( H(U) - [R(D) - \lambda C]^+ - 2\epsilon \max\{1, \lambda\} \right),
\end{aligned} \tag{8}$$

where the first inequality follows from the fact that the rate-distortion code is at rate
$R(D) + \epsilon$, and the following equality is due to the fact that $\ell$ bits of the compressed bit
string are encrypted. At the decoder, first, the $\ell$ key bits $(k_1, \ldots, k_\ell)$ are decoded, and then
the decoded key bits $(\hat{k}_1, \ldots, \hat{k}_\ell)$ are used to decrypt $w^m$, after which the rate-distortion
decoder produces $v^N$. With probability at least $1 - \delta$, the decoded key bits $(\hat{k}_1, \ldots, \hat{k}_\ell)$
agree with the original ones $(k_1, \ldots, k_\ell)$, and then $w^m$ is decrypted correctly to produce
the appropriate reproduction vector $v^N$ within distortion $D$. In the event of erroneous
decoding of $(k_1, \ldots, k_\ell)$, the distortion can only be bounded by $d_{\max}$, but this should be
weighed by the probability of error, which is upper bounded by $\delta$, and hence contributes
only an arbitrarily small additional distortion. This completes the proof of Theorem 1.
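
The last step of the equivocation bound in the direct part can be checked numerically. The
Python sketch below computes the number of encrypted bits, $\min\{n(C-\epsilon), N[R(D)+\epsilon]\}$,
and compares the resulting guarantee with the closed-form lower bound
$N(H(U) - [R(D) - \lambda C]^+ - 2\epsilon\max\{1,\lambda\})$. The function names and the parameter
grid are assumptions for illustration, and integer rounding of the block lengths is ignored.

```python
def encrypted_bits(N, lam, C, R_D, eps):
    """Number of key bits used: min{n(C - eps), N[R(D) + eps]}, with n = lam*N."""
    n = lam * N
    return min(n * (C - eps), N * (R_D + eps))

def equivocation_guarantee(N, lam, H_U, R_D, C, eps):
    """N*H(U) - N[R(D) + eps] + (number of encrypted bits)."""
    return N * H_U - N * (R_D + eps) + encrypted_bits(N, lam, C, R_D, eps)

def closed_form_bound(N, lam, H_U, R_D, C, eps):
    """N * (H(U) - [R(D) - lam*C]^+ - 2*eps*max{1, lam})."""
    return N * (H_U - max(R_D - lam * C, 0.0) - 2.0 * eps * max(1.0, lam))

def bound_holds(N, lam, H_U, R_D, C, eps, tol=1e-9):
    """True if the guarantee is at least the closed-form bound (up to rounding)."""
    return (equivocation_guarantee(N, lam, H_U, R_D, C, eps)
            >= closed_form_bound(N, lam, H_U, R_D, C, eps) - tol)

# Hypothetical grid of parameters (all rates in bits per symbol).
grid_ok = all(
    bound_holds(1000, lam, 1.0, R_D, C, eps)
    for lam in (0.25, 0.5, 1.0, 2.0, 4.0)
    for R_D in (0.1, 0.5, 0.9)
    for C in (0.2, 0.5, 1.0)
    for eps in (0.001, 0.01, 0.05)
)
```

The slack between the two expressions is $N\epsilon(2\max\{1,\lambda\} - 1 - \lambda)$ or more, which
is nonnegative for every $\lambda > 0$.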

4 Discussion 

In this section, we discuss a few variations and extensions of the model considered. 
4.1 Source-Channel Separation 

We have already mentioned in the Introduction that Theorem 1 has the spirit of a separation
theorem, from several points of view. Among them is the immediate observation that
perfect security (in the sense that $h = H(U)$) can be achieved if and only if $R(D) \le \lambda C$,
an inequality of the very same form as that of the classical joint source-channel separation
theorem. In this context, we should also point out that it is straightforward to extend
our setup to a situation of ordinary joint source-channel coding, corresponding to the case
where the cryptogram $w^m$ needs to be transmitted via a noisy channel, independent of the
key distribution channel. The only modification to Theorem 1 would be to replace $R_c$ in
part (b) by the capacity of the main channel. Thus, we have a two-fold separation theorem.

4.2 Simple Coding and Decoding in Special Cases 

Suppose that the compressibility of the cryptogram is not an issue, in other words, $R_c$
is immaterial and we are only interested in the tradeoff between $D$ and $h$. In this case,
there exist situations where optimal performance can be achieved using very simple coding
systems, similarly to the well-known special cases where this can be done in the context
of classical joint source-channel coding (see, e.g., P). Let us suppose, for example, that
$\mathcal{U} = \mathcal{X} = \mathcal{Y} = \mathcal{V}$, $\lambda = 1$, and that the distortion measure $d$ is a difference distortion measure,
i.e., $d(u,v) = \rho(v \ominus u)$ for a well-defined subtraction operation (cf. the corresponding
discussion in the Introduction). Suppose further that $P_U$, which is the uniform distribution
over $\mathcal{U}$, is the capacity-achieving input for the channel $P_{Y|X}$ and that $P_{Y|X}$ in turn achieves
the rate-distortion function of $P_U$ at distortion level $D$, i.e., $R(D) = C$. For example, $P_U$
may be the binary symmetric source (BSS) and $P_{Y|X}$ may be the binary symmetric channel
(BSC) with crossover probability $D$. Then one can easily achieve perfect secrecy,
$h = H(U) = \log|\mathcal{U}|$, at the minimum possible distortion, i.e., $D = R^{-1}(C)$ ($R^{-1}(\cdot)$ being
the distortion-rate function of $U$), in the following manner, which is similar to one of the
strategies discussed in the Introduction: Let $Z_1, Z_2, \ldots$ be a simulated memoryless process,
generated from $K$, with the same (uniform) distribution as $U_1, U_2, \ldots$. Note that when $|\mathcal{U}|$
is a power of 2, this is very easy to implement since $U$ is uniform. For encryption, let
$W_i = U_i \oplus Z_i$. Then, obviously, $H(U^N|W^N) = NH(U) = N\log|\mathcal{U}|$ since $U_i$
and $Z_i$ are uniformly distributed and independent, and so, perfect secrecy is guaranteed. As
for the key transmission, let us send $\{Z_i\}$ uncoded across the channel, i.e., $X_i = Z_i$. Since
$P_{Y|X}$ achieves the rate-distortion function of $\{U_i\}$, and hence also that of $\{Z_i\}$, the channel
output $\{Y_i\}$ will have distortion $D$ relative to $\{Z_i\}$. At the decoder, we simply apply
the equation $V_i = W_i \ominus Y_i$. Since $V_i \ominus U_i = Z_i \ominus Y_i$, then
$E\,d(U_i, V_i) = E\rho(V_i \ominus U_i) = E\rho(Z_i \ominus Y_i) = D = R^{-1}(C)$. Thus, optimal performance is
achieved using a very simple system once we have an independent copy of $\{U_i\}$ as a key.
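
The BSS/BSC example can be simulated in a few lines of Python. This is only a sketch under
stated assumptions: Hamming distortion, pseudo-random bits standing in for the source and for
the key-derived sequence $\{Z_i\}$, and a hypothetical function name `simulate_uncoded_key`.

```python
import random

def simulate_uncoded_key(N, crossover, seed=0):
    """Simulate W_i = U_i xor Z_i, Y_i = Z_i xor noise_i, V_i = W_i xor Y_i.

    Returns (empirical Hamming distortion between U and V,
             empirical fraction of channel bit flips).
    """
    rng = random.Random(seed)
    u = [rng.getrandbits(1) for _ in range(N)]          # source bits
    z = [rng.getrandbits(1) for _ in range(N)]          # key sequence
    w = [ui ^ zi for ui, zi in zip(u, z)]               # cryptogram
    noise = [1 if rng.random() < crossover else 0 for _ in range(N)]
    y = [zi ^ ei for zi, ei in zip(z, noise)]           # BSC output for input z
    v = [wi ^ yi for wi, yi in zip(w, y)]               # reconstruction
    distortion = sum(ui != vi for ui, vi in zip(u, v)) / N
    flip_rate = sum(noise) / N
    return distortion, flip_rate

distortion, flip_rate = simulate_uncoded_key(10000, 0.1)
```

Because $V_i \oplus U_i = Z_i \oplus Y_i$, the reconstruction errors are exactly the channel
flips, so the empirical distortion equals the empirical crossover rate for every run.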
4.3 A Wider Class of Joint Encoders

Another point regarding the case where $R_c$ is immaterial is the following: It turns out that
part (a) of Theorem 1 (both the necessity and the sufficiency) would still apply even if we
broaden the scope to a wider class of encoders that allow both $x^n$ and $w^m$ to depend on
both $u^N$ and $k$. This means that $g_N$ is redefined as $g_N : \{0,1\}^\infty \times \mathcal{U}^N \to \mathcal{X}^n$, and so,
$x^n = g_N(k, u^N)$. The direct part would use the same scheme as before. As for the converse
part, note that eq. (6) is general enough to allow this setup. The conclusion then is that
if only $D$ and $h$ are the figures of merit of interest, then a good key code $g_N$ need not
really use its accessibility to $u^N$. The situation becomes somewhat more involved when the
compressibility is brought back into the picture, because then the encoder has two paths
through which it can pass descriptions of the source. Note that if $R(D) \le \lambda C$, the encoder
can transmit the entire description via the key distribution channel, without using the main
channel at all, thus $R_c = 0$.

4.4 Securing the Reproduction Sequence 

Consider the case where one is interested not only in guaranteeing a certain security level $h$
with regard to the original source, but also in guaranteeing a security level $h'$ with regard to the
reproduction $V^N$. This makes sense because it is actually $V^N$ that is communicated to the
legitimate receiver and thus has to be protected (see also [3]). To derive necessary conditions
for securing $V^N$ at level $h'$, we consider two chains of inequalities. The first is the following:

$$\begin{aligned}
N(h'-\epsilon) &\le H(V^N|W^m) \\
&\le H(V^N) \\
&\le \sum_{i=1}^{N} H(V_i) \\
&= NH(V|J) \\
&\le NH(V),
\end{aligned} \tag{9}$$

where $J$ is a random variable taking values in the set $\{1, \ldots, N\}$ with the uniform distribution
and $V = V_J$. Thus, our first necessary condition for security level $h'$ is that there exists a
random variable $V$ with alphabet $\mathcal{V}$ (jointly distributed with $U$) such that $h' \le H(V)$. The
second chain of inequalities is as follows:

$$\begin{aligned}
N(h'-\epsilon) &\le H(V^N|W^m) \\
&= H(V^N|W^m,Y^n) + I(Y^n;V^N|W^m) \\
&= I(Y^n;V^N|W^m) \\
&\le H(Y^n|W^m) \\
&\le H(Y^n) \\
&\le \sum_{i=1}^{n} H(Y_i) \\
&= nH(Y|J') \\
&\le nH(Y),
\end{aligned} \tag{10}$$

where $J'$ is a random variable taking values in the set $\{1, \ldots, n\}$ with the uniform distribution
and $Y = Y_{J'}$. The second equality is due to the fact that $V^N$ is a function of $(W^m, Y^n)$
and so $H(V^N|W^m,Y^n) = 0$. Thus, another necessary condition is the existence of a random
variable $Y$ at the output of the channel $P_{Y|X}$ (which means the existence of a channel input
variable $X$ that induces $Y$ via $P_{Y|X}$) such that $h' \le \lambda H(Y)$. The combination of the two
necessary conditions then gives $h' \le \min\{H(V), \lambda H(Y)\}$.

A restatement of the necessity part of Theorem 1 would then be the following: If
$(D, R_c, h, h')$ is achievable then there exist a channel $P_{V|U}$ and a source $P_X$ such that
the following conditions are simultaneously satisfied:

(a) $h \le H(U) - [I(U;V) - \lambda I(X;Y)]^+$,

(b) $h' \le \min\{H(V), \lambda H(Y)\}$,

(c) $R_c \ge I(U;V)$,

(d) $D \ge E\,d(U,V)$.

Note that in contrast to Theorem 1, we are no longer taking the minimum of I(U;V) to 
obtain R(D), nor do we take the maximum of I(X; Y) to obtain C. The reason is that such 
optimizations might be in partial conflict with the need to achieve large values of H(V) 
and H(Y) in order to meet condition (b). Thus, there are more complicated compromises 
in the choice of $X$ and $V$ when the tradeoff involves the additional parameter $h'$.
The achievability of this set of conditions remains open in general. However, for the
special case where the channel $P_{Y|X}$ is deterministic, that is, $Y$ is a deterministic function of
$X$, and so $I(X;Y) = H(Y)$, the achievability scheme is essentially the same as before (but
with general choices of $P_X$ and $P_{V|U}$) as long as the required security $h'$ does not exceed the
level $\min\{I(U;V), \lambda I(X;Y)\} = \min\{I(U;V), \lambda H(Y)\}$. If it is higher, and if $\lambda H(Y)$ exceeds
$I(U;V)$, the additional key bits beyond $NI(U;V)$ (but not more than $H(V)$) conveyed by the
channel can be used to control the (secret) choice of the rate-distortion codebook among up
to $2^{NH(V|U)}$ distinct codebooks that exist (cf. 050) and thereby achieve the extra security
needed with regard to $V^N$.

Note that here, the separation principle no longer holds as before, in the strong meaning
of this term, because now, the choice of $P_X$ and $P_{V|U}$ involves compromises where there is
an interaction between the source coding of $U^N$ and the channel coding of $K$.

4.5 Feedback 

Finally, consider the scenario of the previous subsection, where in addition, there is noiseless
feedback from the channel output to the transmitter. In this case, it is also clear how to
secure $V^N$ to the level of $h' = \min\{H(V), \lambda H(Y)\}$, and it is also clear that this value cannot
be further improved upon. Here, the encoder and the decoder simply share identical copies
of $\{Y_i\}$ as a common key at both ends, and there is no longer use for the original key, $\{K_i\}$.
By the same token, in this case, the equivocation of $U^N$ can be enhanced to the level of
$h = H(U) - [I(U;V) - \lambda H(Y)]^+$, but not more. Thus, although feedback does not increase
the capacity of a DMC, it certainly improves its effectiveness when this channel serves for
key delivery.

4.6 Continuous Alphabets 

In our derivations thus far, we have limited ourselves to finite alphabet sources and channels,
primarily for reasons of convenience. Theorem 1 extends quite straightforwardly to the
continuous alphabet case as well. One comment is in order, however: In the continuous
alphabet case, it no longer makes sense to measure equivocation in terms of conditional
(differential) entropy, which can be negative. It still makes sense, nonetheless, to define
it by the complementary quantity, the mutual information $I(W^m;U^N)$, which is always
non-negative. Thus, part (a) of Theorem 1 would be restated to assert that $[R(D) - \lambda C]^+$
is an achievable lower bound to $I(W^m;U^N)/N$.
Figure 1: A cipher system with capacity-limited key distribution.